LLM Security, Vulnerabilities, and Models
How the models work
Open Worldwide Application Security Project (OWASP) Top 10 for Large Language Model Applications
the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. Examples of vulnerabilities include prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution, among others. The goal is to raise awareness of these vulnerabilities, suggest remediation strategies, and ultimately improve the security posture of LLM applications.
OpenAI has a Model Spec intended to precisely outline how its models work. Similar to Anthropic’s Constitutional AI, it’s a series of internal prompts and other measures that guide answers toward “ethical” and more helpful results.
For example:
- Follow the chain of command
- Comply with applicable laws
- Don’t provide information hazards
- Respect creators and their rights
- Protect people’s privacy
- Don’t respond
Anthropic goes further, attempting to model how its Claude product works and showing precisely the chain of “neurons” that are activated in responses that include words like “Golden Gate Bridge” or when discussing gender bias.